System Neuroscience - Sound localization

This is a Jupyter Notebook. The code you see is written in Python 3.

Audio samples in this Notebook are meant to be played via headphones or earphones, which are needed to perceive the binaural sound localization cues.

WARNING:
The audio elements in Jupyter Notebooks may play very loud sounds.
Turn down the volume in your operating system (Windows/macOS), or the media volume when using a phone.
Increase the volume stepwise, starting from a low level, until you can hear the audio samples comfortably.
Be careful to avoid noise damage or discomfort!

Just view it and play Audio samples:

When you open the Notebook in a suitable environment that renders it correctly, the <Audio> output elements should show up in your browser right away, as they are saved together with the Notebook.

If you have no Python/Jupyter installed on your computer, just visit:

https://colab.research.google.com/github/penalab/systems-sound-localization/blob/master/Sys_Audio_Class.ipynb

A "static" export of this notebook is also available (but may be removed without further notice in the future):

https://elpy.de/class/Sys_Audio_Class.html

Play with parameters:

If you want to play around with the code and change parameters, there are two alternatives:

Google's Colaboratory

Requires: Google account

  1. Login to your Google Account
  2. Visit the link to the Notebook in Google's "Colaboratory":
    https://colab.research.google.com/github/penalab/systems-sound-localization/blob/master/Sys_Audio_Class.ipynb
  3. On the top-right, click "Connect" (or the little arrow right of it) and then "Connect to hosted runtime".
  4. Now you can "run" (i.e., execute) the Python code of the individual code cells by clicking the "Play" icon in the top-left corner of any cell containing Python code. After running the first code cell (the one near the beginning with several lines starting with import) once, you are free to play around.
  5. You can save your changes into your personal Google Drive or download the Notebook to your computer and run it locally (assuming you have Python installed).

Execute Jupyter locally

Requires: Python 3 installed on your computer, plus at least numpy, jupyter and all of their dependencies. All of this comes bundled in the Anaconda Distribution. Make sure you have Python 3.6 or 3.7.

You can then download the Notebook file Sys_Audio_Class.ipynb from Github, place it in a directory where you can find it in Jupyter, and run it as usual.

Notes

In between the Python code cells (each having an input area with code and, possibly, an output area), there may be text areas (like this one). These are Markdown cells. If you accidentally start editing one of these, it looks less like nicely formatted text and more like a colored text editor. You can "execute" those cells to recover the formatted version.

What parameters to change?

There are two functions noise() and tone() which are used to generate the examples below.

Both functions display a small Audio element that you can use to play the generated sounds.

To use one of the functions, just write its name followed immediately by parentheses, like this:

noise()

or

tone()

You can change parameters to non-default values by indicating this inside the parentheses. For example, a tone with 1000 Hz instead of the default 440 Hz is generated like this:

tone(freq = 1000)

Do not include the units of the parameters. Only numbers are allowed. Decimals like 123.45 are usually allowed as well.

All parameter names are case sensitive (use all-lower-case).

noise() parameters

parameter default value description
min_freq 20 Lower end of the frequency range (pass-band) of the noise in Hz, must be positive.
max_freq 20000 Upper end of the frequency range (pass-band) of the noise in Hz, must be greater than min_freq and should not be above 24000.
itd 0 Interaural time difference in microseconds (µs).
ild 0 Interaural level difference in decibel (dB).
bc +100 Binaural correlation in percent, between -100 and +100.
length 1.0 Length of the signal in seconds, keep it reasonable.
level 0 Relative level in dB, can be negative or positive to decrease or increase the level. If you see a warning indicating that clipping occurred, the level is too high. The maximum value also depends on the actual ILD (and vice versa).

For each parameter that you don't indicate explicitly, the default value will be used. In other words, using the noise() function without parameters is equivalent to (explicitly) setting all parameters to their default values:

noise(min_freq = 20, max_freq = 20000, itd = 0, ild = 0, bc = 100, length = 1.0, level = 0)
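For reference, level and ild are plain decibel values. The conversion from decibels to linear amplitude, as implemented by the notebook's db2amp() helper, is just a short sketch:

```python
# Decibel -> linear amplitude, as implemented by db2amp() in the notebook
def db2amp(d):
    return 10 ** (d / 20)

half = db2amp(-6)    # ~0.501: -6 dB roughly halves the amplitude
ten = db2amp(20)     # 10.0: +20 dB is a tenfold amplitude increase
```

Note that internally the ILD is split symmetrically: half of it attenuates one ear while the other half amplifies the other ear.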

tone() parameters

parameter default value description
freq 440 Frequency of the tone in Hz.
itd 0 Interaural time difference in microseconds (µs).
ild 0 Interaural level difference in decibel (dB).
bc +100 Binaural correlation in percent, between -100 and +100.
length 1.0 Length of the signal in seconds, keep it reasonable.
level 0 Relative level in dB, can be negative or positive to decrease or increase the level. If you see a warning indicating that clipping occurred, the level is too high.

Again, using the tone() function without parameters is equivalent to setting all parameters to their default values:

tone(freq = 440, itd = 0, ild = 0, bc = 100, length = 1.0, level = 0)

The other parameters of noise() and tone(), such as ramp_dur, pad_length, show_player and return_signal, do not need to be changed for the purposes of this class. But you are free to do so... maybe to learn more about Python.

In [1]:
#####################################################
## This code cell must be executed once before the ##
## others. It defines all the necessary functions. ##
#####################################################

import numpy as np
from IPython.display import Audio, HTML, display
import struct
from io import BytesIO
import wave

# Default sampling rate:
samplingrate = 48000

def db2amp(d):
    return 10 ** (float(d) / 20)
def amp2db(f):
    return 20 * np.log10(f)

def us2s(microseconds):
    return microseconds * 1e-6
def s2us(seconds):
    return seconds * 1e6

def ms2s(m_seconds):
    return m_seconds * 1e-3
def s2ms(seconds):
    return seconds * 1e3

def times(signal):
    if isinstance(signal, (tuple, list)):
        signal = signal[0]
    return np.arange(signal.size)/samplingrate

def signal_energy(signal, fs=100e3):
    #signal = signal.astype(np.float64)
    #print np.min(signal), np.max(signal)
    return np.sum(signal.astype(np.float64) ** 2) / fs

def signal_power(signal, fs=100e3):
    return signal_energy(signal, fs=fs) / signal.size

def band_limits(center=8500, octave_width=1.0):
    return float(center) * 2**(-.5*octave_width), float(center) * 2**(+.5*octave_width)

def to_stereo(signal):
    """Return a tuple of left and right signal
    A single signal will be duplicated.
    A stereo signal will be returned unchanged.
    """
    try:
        signal_L, signal_R = signal
        #print "Play: Stereo"
    except ValueError:
        signal_L = signal
        signal_R = signal
        #print "Play: Mono"
    return (signal_L, signal_R)

def ramp(x = None, start = 0.0, stop = 1.0, ramp_dur = .05, fs = None, amp = 1.0):
    """ramp(x = np.ones(fs), start = .400, stop = .500, ramp_dur = .05, fs = fs, amp = 1.0)
    
    Ramp the signal x by linear ramps between start and stop
    
    Examples
    ========
    
    >>> ramp()
    applies the default ramp to internally generated data x: all ones, 1 s long, at sampling rate fs.
    """
    if fs is None:
        fs = samplingrate
    if x is None:
        x = np.ones(int(fs))
    t1 = start
    t2 = (start + ramp_dur)
    t3 = (stop - ramp_dur)
    t4 = stop
    t = np.arange(x.size) / fs
    r = amp * np.piecewise(t, [(t1 <= t) & (t < t2), (t2 <= t) & (t < t3), (t3 <= t) & (t < t4)],
                         [lambda t: (t - t1) / (t2-t1), 1, lambda t: 1 - (t - t3) / (t4-t3), 0])
    return r * x

def bandpass_itd_noise(min_freq = 20, max_freq = 20000, itd=0, fs=None, samples = None):
    """bandpass_itd_noise()
    """
    # Calculate available fft frequencies:
    if fs is None:
        fs = samplingrate
    if samples is None:
        samples = int(fs)
    freqs = np.fft.rfftfreq(samples, 1/fs)
    f = np.zeros_like(freqs)
    idx = np.where(np.logical_and(freqs>=min_freq, freqs<=max_freq))[0]
    # Scale so overall power is independent of bandwidth (idx.size = number of pass-band bins)
    f[idx] = np.sqrt(.5 * samples) * (f.size / idx.size)**.5
    f = np.array(f, dtype='complex')

    phases_rad = np.random.rand(len(f)) * 2 * np.pi
    phases = np.cos(phases_rad) + 1j * np.sin(phases_rad)

    itd_phas_rad = us2s(itd) * np.pi * freqs #  * .5 * 2
    shift_left = np.cos(-itd_phas_rad) + 1j * np.sin(-itd_phas_rad)
    shift_right = np.cos(+itd_phas_rad) + 1j * np.sin(+itd_phas_rad)
    phases_left = phases * shift_left
    phases_right = phases * shift_right

    f_left = f * phases_left
    f_right = f * phases_right

    s_left = np.fft.irfft(f_left).real
    s_right = np.fft.irfft(f_right).real
    return s_left, s_right

def do_ild(signal, ild = 0):
    signal_L, signal_R = to_stereo(signal)
    amp_L = db2amp(-.5 * ild)
    amp_R = db2amp(+.5 * ild)
    return amp_L * signal_L, amp_R * signal_R

def do_bc(signal, bc = 100, **noise_kwargs):
    signal_L, signal_R = to_stereo(signal)
    if bc != 100:
        if bc < 0:
            signal_R = -signal_R
        r = abs(bc / 100)
        _, decorr_noise_L = bandpass_itd_noise(itd=0, **noise_kwargs)
        _, decorr_noise_R = bandpass_itd_noise(itd=0, **noise_kwargs)
        # Adjust powers:
        decorr_noise_L = decorr_noise_L / (signal_power(decorr_noise_L) / signal_power(signal_L))**.5
        decorr_noise_R = decorr_noise_R / (signal_power(decorr_noise_R) / signal_power(signal_R))**.5
        signal_L = r**.5 * signal_L + (1 - r)**.5 * decorr_noise_L
        signal_R = r**.5 * signal_R + (1 - r)**.5 * decorr_noise_R
    return signal_L, signal_R

def stack(*signals):
    n = len(signals)
    signal_stack = tuple(np.sum(np.stack([s[c] for s in signals]), axis=0) / np.sqrt(n) for c in (0, 1))
    return signal_stack

def pscale(signal, rel_power=1.0):
    scaled_signal = tuple(signal[c] * np.sqrt(rel_power) for c in (0, 1))
    return scaled_signal

def player(signal, rel_abi=0):
    # Make it stereo and stack into one array
    signal = np.vstack(to_stereo(signal))
    scaled = signal.T.ravel() * 1024 * db2amp(rel_abi)
    if np.max(np.abs(scaled)) > 32767:
        scale_by = 32767 / np.max(np.abs(scaled))
        scaled = scaled * scale_by
        import sys
        print(f"Signal would have clipped. Scaled (down) by {amp2db(scale_by):.1f} dB.", file=sys.stderr)
    fp = BytesIO()
    waveobj = wave.open(fp,mode='wb')
    waveobj.setnchannels(2)
    waveobj.setframerate(samplingrate)
    waveobj.setsampwidth(2)
    waveobj.setcomptype('NONE','NONE')
    waveobj.writeframes(b''.join([struct.pack('<h',x) for x in np.int16(scaled)]))
    pmc_data = fp.getvalue()
    waveobj.close()
    display(Audio(pmc_data, rate=samplingrate, autoplay=False))

def save_wave(signal, filename, rel_abi=0):
    # Make it stereo and stack into one array
    signal = np.vstack(to_stereo(signal))
    scaled = signal.T.ravel() * 1024 * db2amp(rel_abi)
    if np.max(np.abs(scaled)) > 32767:
        scale_by = 32767 / np.max(np.abs(scaled))
        scaled = scaled * scale_by
        import sys
        print(f"Signal would have clipped. Scaled (down) by {amp2db(scale_by):.1f} dB.", file=sys.stderr)
    with open(filename, 'wb') as fp:
        waveobj = wave.open(fp,mode='wb')
        waveobj.setnchannels(2)
        waveobj.setframerate(samplingrate)
        waveobj.setsampwidth(2)
        waveobj.setcomptype('NONE','NONE')
        waveobj.writeframes(b''.join([struct.pack('<h',x) for x in np.int16(scaled)]))
        waveobj.close()

def noise(min_freq = 20, max_freq = 20000, itd=0, ild=0, bc=100, length=1.0, level=0, ramp_dur=0.05, pad_length=0.005, show_player=True, return_signal=False):
    samples = int((length + 2 * pad_length) * samplingrate)
    signal_LR = bandpass_itd_noise(min_freq, max_freq, itd, samplingrate, samples)
    signal_LR = do_bc(signal_LR, bc=bc, min_freq=min_freq, max_freq=max_freq, fs=samplingrate, samples=samples)
    signal_LR = do_ild(signal_LR, ild)
    signal_LR = [ramp(s, start=pad_length, stop=length+pad_length, ramp_dur=ramp_dur) for s in signal_LR]
    if show_player:
        display(HTML(f'<p><strong>Noise</strong> {min_freq}-{max_freq} Hz, '
                     f'ITD = {itd} μs, ILD = {ild} dB, BC = {bc}%'
                     '</p><style type="text/css">audio {min-width: 400px; }</style>'))
        player(signal_LR, rel_abi=level)
    if return_signal:
        return signal_LR

def tone(freq = 440, itd=0, ild=0, bc=100, length=1.0, level=0, ramp_dur=0.05, pad_length=0.005, show_player=True, return_signal=False, **bc_noise_kwargs):
    samples = int((length + 2 * pad_length) * samplingrate)
    t = np.arange(0, samples) / samplingrate
    phase = np.random.rand(1) * 2 * np.pi
    shift = us2s(itd) * np.pi * freq
    signal_LR = (
        np.sin(t * 2 * np.pi * freq + phase - shift),
        np.sin(t * 2 * np.pi * freq + phase + shift)
    )
    signal_LR = do_bc(signal_LR, bc=bc, fs=samplingrate, samples=samples, **bc_noise_kwargs)
    signal_LR = do_ild(signal_LR, ild)
    signal_LR = [ramp(s, start=pad_length, stop=length+pad_length, ramp_dur=ramp_dur) for s in signal_LR]
    if show_player:
        display(HTML(f'<p><strong>Tone</strong> {freq} Hz, '
                     f'ITD = {itd} μs, ILD = {ild} dB, BC = {bc}%'
                     '</p><style type="text/css">audio {min-width: 400px; }</style>'))
        player(signal_LR, rel_abi=level)
    if return_signal:
        return signal_LR

def tone_stack(freqs = np.arange(500, 12000, 1000), show_player=True, return_signal=False, show_single_player=False, **tone_kwargs):
    tone_kwargs.update(show_player=show_single_player)
    tone_kwargs.update(return_signal=True)
    signal_LR = stack(*[tone(freq=f, **tone_kwargs) for f in freqs])
    if show_player:
        display(HTML(f'<p><strong>Tone-Stack</strong> {freqs} Hz, '
                     f'ITD = {tone_kwargs.get("itd",0)} μs, ILD = {tone_kwargs.get("ild",0)} dB, BC = {tone_kwargs.get("bc",100)}%'
                     '</p><style type="text/css">audio {min-width: 400px; }</style>'))
        player(signal_LR)
    if return_signal:
        return signal_LR

Localize Broadband Noise

The following two cells each include several 0.5-second (500 ms) broadband noise samples.

ITD

In the first cell, the interaural time difference (ITD) was varied.

By convention, negative ITDs indicate that the sound presented to the left ear is leading, positive ITDs indicate that the sound presented to the right ear is leading.

Sounds that include "only" ITD information are usually perceived "inside the head", i.e. they are internalized. However, sounds with different ITDs should be clearly shifted inside the head according to the ITD value.

Listen to the different sound samples - it's easiest to start with the one that has ITD = 0 µs. Check your own ITD range and correlate perceptual effects with ITD values.
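The ITD convention can also be checked numerically. The following standalone sketch (numpy only, a simplified version of the frequency-domain method the notebook uses internally) generates a broadband noise pair carrying an ITD of +400 μs and recovers the ITD from the peak of the interaural cross-correlation - the same computation thought to underlie neural ITD detection:

```python
import numpy as np

fs = 48_000
itd = 400e-6                                  # +400 us: right ear leading
rng = np.random.default_rng(0)
n = 4800                                      # 0.1 s, kept short so np.correlate stays fast
freqs = np.fft.rfftfreq(n, 1 / fs)
band = (freqs >= 20) & (freqs <= 20_000)
spec = np.where(band, np.exp(2j * np.pi * rng.random(freqs.size)), 0)
shift = np.exp(1j * np.pi * freqs * itd)      # half the ITD applied to each ear
left = np.fft.irfft(spec / shift)             # left channel, delayed by itd/2
right = np.fft.irfft(spec * shift)            # right channel, advanced by itd/2

cc = np.correlate(right, left, mode='full')   # interaural cross-correlation
lag = np.argmax(cc) - (n - 1)                 # lag (in samples) of the peak
est_itd_us = -lag / fs * 1e6                  # recovered ITD, close to +400 us
```

The recovered value is only quantized by the sampling period (about 21 μs at 48 kHz).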

In [2]:
for itd in [-4800, -2400, -1200, -900, -750, -600, -450, -300, -150, 0, 150, 300, 450, 600, 750, 900, 1200, 2400, 4800]:
    noise(itd=itd, length=.5)

Noise 20-20000 Hz, ITD = -4800 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = -2400 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = -1200 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = -900 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = -750 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = -600 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = -450 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = -300 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = -150 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = 0 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = 150 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = 300 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = 450 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = 750 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = 900 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = 1200 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = 2400 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = 4800 μs, ILD = 0 dB, BC = 100%

ILD

In the second cell, the interaural level difference (ILD) was varied.

By convention, negative ILDs indicate that the sound presented to the left ear is louder, positive ILDs indicate that the sound presented to the right ear is louder.

Like ITD-only sounds, ILD-only sounds are internalized. The most extreme ILD values may actually sound like noise is presented to one ear alone.

In [3]:
for ild in [-40, -30, -20, -15, -10, -5, 0, 5, 10, 15, 20, 30, 40]:
    noise(ild=ild, length=.5)

Noise 20-20000 Hz, ITD = 0 μs, ILD = -40 dB, BC = 100%

Noise 20-20000 Hz, ITD = 0 μs, ILD = -30 dB, BC = 100%

Noise 20-20000 Hz, ITD = 0 μs, ILD = -20 dB, BC = 100%

Noise 20-20000 Hz, ITD = 0 μs, ILD = -15 dB, BC = 100%

Noise 20-20000 Hz, ITD = 0 μs, ILD = -10 dB, BC = 100%

Noise 20-20000 Hz, ITD = 0 μs, ILD = -5 dB, BC = 100%

Noise 20-20000 Hz, ITD = 0 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = 0 μs, ILD = 5 dB, BC = 100%

Noise 20-20000 Hz, ITD = 0 μs, ILD = 10 dB, BC = 100%

Noise 20-20000 Hz, ITD = 0 μs, ILD = 15 dB, BC = 100%

Noise 20-20000 Hz, ITD = 0 μs, ILD = 20 dB, BC = 100%

Noise 20-20000 Hz, ITD = 0 μs, ILD = 30 dB, BC = 100%

Noise 20-20000 Hz, ITD = 0 μs, ILD = 40 dB, BC = 100%

ITD detection varying frequency and bandwidth

The unambiguous detection of ITD depends on the frequencies available in a sound stimulus.

First, there is an upper limit of frequencies above which auditory nerve fibers cannot pass time (or phase) information. This limit is different between species.

Listen to the following pure tones, all at an ITD of +600 μs (right), with increasing frequencies.

Up to which frequency do you perceive the tone at the right side? At which frequencies do you perceive it more in the center or not localized at all?
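Apart from the phase-locking limit itself, a little arithmetic helps here: for a fixed ITD, the interaural phase difference (IPD) of a tone grows linearly with frequency, and once it exceeds half a cycle, the opposite side actually provides the nearer phase match. A standalone sketch:

```python
import numpy as np

itd = 600e-6                              # fixed ITD of +600 us
freqs = np.array([300, 600, 833, 1200, 2400])
ipd_cycles = freqs * itd                  # interaural phase difference in cycles
# The IPD reaches half a cycle at 1/(2 * 600 us) ~ 833 Hz; above that,
# the phase cue alone no longer points unambiguously to the correct side.
```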

In [4]:
for freq in [150, 300, 450, 600, 750, 900, 1050, 1200, 1350, 1500, 1800, 2100, 2400, 2700, 3000, 4000, 6000, 8000]:
    tone(freq=freq, itd=600, length=.5)

Tone 150 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 300 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 450 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 600 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 750 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 900 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 1050 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 1200 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 1350 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 1500 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 1800 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 2100 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 2400 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 2700 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 3000 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 4000 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 6000 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 8000 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

We can further explore our (limited) ability to detect ITDs at high frequencies.

Let's first use some tones and narrow band noises in the low frequency range, i.e. at or around 500 Hz.

You should be able to localize these toward the left/right depending on the individual ITD.

In [5]:
cf = 500 # frequency or center frequency in Hz

for itd in [-500, -250, 0, 250, 500]:
    tone(freq=cf, itd=itd, length=.5)

bw = 400 # bandwidth in Hz
for itd in [-500, -250, 0, 250, 500]:
    noise(min_freq=cf-bw/2, max_freq=cf+bw/2, itd=itd, length=.5)

Tone 500 Hz, ITD = -500 μs, ILD = 0 dB, BC = 100%

Tone 500 Hz, ITD = -250 μs, ILD = 0 dB, BC = 100%

Tone 500 Hz, ITD = 0 μs, ILD = 0 dB, BC = 100%

Tone 500 Hz, ITD = 250 μs, ILD = 0 dB, BC = 100%

Tone 500 Hz, ITD = 500 μs, ILD = 0 dB, BC = 100%

Noise 300.0-700.0 Hz, ITD = -500 μs, ILD = 0 dB, BC = 100%

Noise 300.0-700.0 Hz, ITD = -250 μs, ILD = 0 dB, BC = 100%

Noise 300.0-700.0 Hz, ITD = 0 μs, ILD = 0 dB, BC = 100%

Noise 300.0-700.0 Hz, ITD = 250 μs, ILD = 0 dB, BC = 100%

Noise 300.0-700.0 Hz, ITD = 500 μs, ILD = 0 dB, BC = 100%

Using a higher (center) frequency removes our ability to detect ITDs. Here we use 7000 Hz (7 kHz).

In [6]:
cf = 7000 # frequency or center frequency in Hz

for itd in [-500, -250, 0, 250, 500]:
    tone(freq=cf, itd=itd, length=.5)

bw = 400 # bandwidth in Hz
for itd in [-500, -250, 0, 250, 500]:
    noise(min_freq=cf-bw/2, max_freq=cf+bw/2, itd=itd, length=.5)

Tone 7000 Hz, ITD = -500 μs, ILD = 0 dB, BC = 100%

Tone 7000 Hz, ITD = -250 μs, ILD = 0 dB, BC = 100%

Tone 7000 Hz, ITD = 0 μs, ILD = 0 dB, BC = 100%

Tone 7000 Hz, ITD = 250 μs, ILD = 0 dB, BC = 100%

Tone 7000 Hz, ITD = 500 μs, ILD = 0 dB, BC = 100%

Noise 6800.0-7200.0 Hz, ITD = -500 μs, ILD = 0 dB, BC = 100%

Noise 6800.0-7200.0 Hz, ITD = -250 μs, ILD = 0 dB, BC = 100%

Noise 6800.0-7200.0 Hz, ITD = 0 μs, ILD = 0 dB, BC = 100%

Noise 6800.0-7200.0 Hz, ITD = 250 μs, ILD = 0 dB, BC = 100%

Noise 6800.0-7200.0 Hz, ITD = 500 μs, ILD = 0 dB, BC = 100%

Ambiguous signals, a central/frontal bias and its resolution

An interesting effect arises when we use a combination of frequencies and ITDs that leads to "false" impressions.

To experience this effect, we use a (center) frequency of 714 Hz - we have seen above that ITDs can be detected in this frequency range. You can try this here again. Start at ITD = 0, then progressively go to higher ITDs.

What do you experience when you reach ITDs close to the period $T={1 \over f}$ of this frequency? Compare this to the negative ITDs.

Explanation (read after trying yourself): The periodic nature of tones leads to a "false" detection of ITD when the absolute value of the ITD approaches the period corresponding to a tone's frequency. When two tones (sine waves) of the same frequency are cross-correlated to determine the time lag between them (which is equal to the ITD), multiple ambiguous solutions are possible, spaced apart by the period of the frequency used. The actual percept that arises is biased toward the front, i.e. the smallest absolute ITD value is assumed. For a frequency of 714 Hz, this means that ITD = 0 μs and ITD = +1400 μs are effectively the same, as are -200 and +1200 μs, -400 and +1000 μs, -700 and +700 μs, and so on...
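This explanation can be reproduced numerically. The standalone sketch below cross-correlates a 714 Hz tone pair carrying a true ITD of +1200 μs and, emulating the frontal bias, picks the correlation peak closest to zero lag; the recovered ITD comes out near -200 μs (left) instead of +1200 μs (right):

```python
import numpy as np

fs = 48_000
f, itd = 714, 1200e-6                      # tone frequency (Hz), true ITD (s)
t = np.arange(int(0.1 * fs)) / fs
left = np.sin(2 * np.pi * f * (t - itd / 2))    # left delayed by half the ITD
right = np.sin(2 * np.pi * f * (t + itd / 2))   # right advanced by half the ITD

cc = np.correlate(right, left, mode='full')
lags = np.arange(-(t.size - 1), t.size)

# Frontal bias: only consider lags within about half a period (~700 us)
win = np.abs(lags) <= int(700e-6 * fs)
best = lags[win][np.argmax(cc[win])]
est_itd_us = -best / fs * 1e6              # comes out near -200 us, i.e. "left"
```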

In [7]:
f = 714
print(f"The period of f = {f} Hz is approx. T = {1/f*1e6:.0f} μs")
for itd in [-400, -200, 0, 200, 400, 600, 800, 1000, 1200, 1400, 1600]:
    tone(freq=f, itd=itd)
The period of f = 714 Hz is approx. T = 1401 μs

Tone 714 Hz, ITD = -400 μs, ILD = 0 dB, BC = 100%

Tone 714 Hz, ITD = -200 μs, ILD = 0 dB, BC = 100%

Tone 714 Hz, ITD = 0 μs, ILD = 0 dB, BC = 100%

Tone 714 Hz, ITD = 200 μs, ILD = 0 dB, BC = 100%

Tone 714 Hz, ITD = 400 μs, ILD = 0 dB, BC = 100%

Tone 714 Hz, ITD = 600 μs, ILD = 0 dB, BC = 100%

Tone 714 Hz, ITD = 800 μs, ILD = 0 dB, BC = 100%

Tone 714 Hz, ITD = 1000 μs, ILD = 0 dB, BC = 100%

Tone 714 Hz, ITD = 1200 μs, ILD = 0 dB, BC = 100%

Tone 714 Hz, ITD = 1400 μs, ILD = 0 dB, BC = 100%

Tone 714 Hz, ITD = 1600 μs, ILD = 0 dB, BC = 100%

However, the ambiguities observed in pure tones and narrow band noises can be cancelled out by increasing the bandwidth, i.e. by "adding" more frequencies to the signal.

The next samples all use ITD = 1200 μs and a center frequency of 714 Hz. Start at the smallest bandwidth (here, 1 Hz is effectively just a tone due to the method used to generate the stimuli) and experience how the perceived location of the stimulus changes with increasing bandwidth.
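Numerically, widening the band has the same effect: with enough bandwidth, the cross-correlation develops a single dominant peak at the true ITD, and the period-spaced side peaks shrink. The following standalone sketch (a simplified version of the notebook's generator) uses the 1000 Hz band around 714 Hz and recovers ITD = +1200 μs from the global correlation peak:

```python
import numpy as np

fs = 48_000
itd = 1200e-6
n = 4800                                   # 0.1 s, kept short for np.correlate
rng = np.random.default_rng(0)
freqs = np.fft.rfftfreq(n, 1 / fs)
band = (freqs >= 214) & (freqs <= 1214)    # 1000 Hz band centered on 714 Hz
spec = np.where(band, np.exp(2j * np.pi * rng.random(freqs.size)), 0)
shift = np.exp(1j * np.pi * freqs * itd)
left = np.fft.irfft(spec / shift)
right = np.fft.irfft(spec * shift)

cc = np.correlate(right, left, mode='full')
lag = np.argmax(cc) - (n - 1)
est_itd_us = -lag / fs * 1e6               # global peak now sits near +1200 us
```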

In [8]:
cf = 714
for bw in [1, 50, 150, 250, 350, 500, 800, 1000]:
    print(f"Bandwidth: {bw} Hz")
    noise(min_freq=cf-bw/2, max_freq=cf+bw/2, itd=1200, ramp_dur=.05)
Bandwidth: 1 Hz

Noise 713.5-714.5 Hz, ITD = 1200 μs, ILD = 0 dB, BC = 100%

Bandwidth: 50 Hz

Noise 689.0-739.0 Hz, ITD = 1200 μs, ILD = 0 dB, BC = 100%

Bandwidth: 150 Hz

Noise 639.0-789.0 Hz, ITD = 1200 μs, ILD = 0 dB, BC = 100%

Bandwidth: 250 Hz

Noise 589.0-839.0 Hz, ITD = 1200 μs, ILD = 0 dB, BC = 100%

Bandwidth: 350 Hz

Noise 539.0-889.0 Hz, ITD = 1200 μs, ILD = 0 dB, BC = 100%

Bandwidth: 500 Hz

Noise 464.0-964.0 Hz, ITD = 1200 μs, ILD = 0 dB, BC = 100%

Bandwidth: 800 Hz

Noise 314.0-1114.0 Hz, ITD = 1200 μs, ILD = 0 dB, BC = 100%

Bandwidth: 1000 Hz

Noise 214.0-1214.0 Hz, ITD = 1200 μs, ILD = 0 dB, BC = 100%

Envelope ITD detection at high frequencies

We have seen before that it is not possible to detect ITDs in high-frequency tones or even in narrowband noises containing a range of high frequencies.

If we use a wide enough range of frequencies, however, the so-called envelope (random fluctuations in sound intensity), which always varies at lower frequencies than the carrier, may be used to detect ITDs.

Let's test this with just two ITDs: -400 μs (left) and +400 μs (right). Verify that you cannot hear any difference between the two ITDs of tones at 4 kHz and 8 kHz. Then see if the same is true for a noise containing all frequencies in between.
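The claim that the envelope fluctuates much more slowly than the carrier can be checked directly. This standalone sketch (numpy only, a simplified stand-in for the notebook's generator) builds a 4-8 kHz noise band, extracts its envelope with an FFT-based Hilbert transform, and measures where the envelope's spectral energy lies:

```python
import numpy as np

fs = 48_000
n = fs                                        # 1 s of signal
rng = np.random.default_rng(0)
freqs = np.fft.rfftfreq(n, 1 / fs)
band = (freqs >= 4000) & (freqs <= 8000)
spec = np.where(band, np.exp(2j * np.pi * rng.random(freqs.size)), 0)
x = np.fft.irfft(spec)                        # 4-8 kHz bandpass noise

# Envelope via the analytic signal (FFT-based Hilbert transform)
X = np.fft.fft(x)
h = np.zeros(n)
h[0] = h[n // 2] = 1
h[1:n // 2] = 2
env = np.abs(np.fft.ifft(X * h))

# Fraction of the envelope's (mean-removed) spectral energy below the band
E = np.abs(np.fft.rfft(env - env.mean())) ** 2
frac_low = E[freqs < 4000].sum() / E.sum()    # close to 1
```

Even though the carrier occupies 4-8 kHz, essentially all of the envelope's fluctuation energy lies below 4 kHz, so the envelope can still carry usable timing information.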

In [9]:
tone(freq=4000, itd=-400)
tone(freq=4000, itd=400)

tone(freq=8000, itd=-400)
tone(freq=8000, itd=400)

noise(min_freq=4000, max_freq=8000, itd=-400)
noise(min_freq=4000, max_freq=8000, itd=400)

Tone 4000 Hz, ITD = -400 μs, ILD = 0 dB, BC = 100%

Tone 4000 Hz, ITD = 400 μs, ILD = 0 dB, BC = 100%

Tone 8000 Hz, ITD = -400 μs, ILD = 0 dB, BC = 100%

Tone 8000 Hz, ITD = 400 μs, ILD = 0 dB, BC = 100%

Noise 4000-8000 Hz, ITD = -400 μs, ILD = 0 dB, BC = 100%

Noise 4000-8000 Hz, ITD = 400 μs, ILD = 0 dB, BC = 100%

Binaural Correlation

As mentioned earlier, the mechanism to detect ITDs is similar to cross-correlation. As we have seen, the success of this detection depends on the frequencies contained in the signal - in a perfect world.

In nature (and in the lab), the correlation between the two signals arriving at (or presented to) the two ears may be impaired. This could be due to other sounds arriving at the same time or due to noise that reaches the ears nearly independently. Neuronal noise may also play a role in deteriorating the signal quality before cross-correlation can be computed in the brain (e.g. via the Jeffress model in the avian brain).

We can mimic this effect by adding independent noise to the signals presented to the two ears. This independent noise carries no ITD information. The amount of noise added is expressed as binaural correlation (BC), where 100% indicates a perfectly correlated signal and 0% indicates that only independent noise is presented.

Below, you can experience different levels of binaural correlation. Start with the first three examples (BC = 100%) and work your way down to lower BC values. What do you experience? Can you still localize the signal at low BC values, i.e. distinguish the three different ITDs?
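The mixing weights used internally, sqrt(r) for the common signal and sqrt(1 - r) for the independent noise, indeed produce the stated interaural correlation. A standalone sketch with white Gaussian noise (illustrative only; the notebook uses band-limited noise with matched powers):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
common = rng.standard_normal(n)       # shared signal (would carry the ITD)
ind_L = rng.standard_normal(n)        # independent noise, left ear
ind_R = rng.standard_normal(n)        # independent noise, right ear

bc = 0.6                              # target binaural correlation (60%)
L = np.sqrt(bc) * common + np.sqrt(1 - bc) * ind_L
R = np.sqrt(bc) * common + np.sqrt(1 - bc) * ind_R

r = np.corrcoef(L, R)[0, 1]           # measured correlation, ~0.6
```

This works because the independent components contribute variance but no covariance, so the left-right covariance is exactly the weight of the common signal, bc.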

In [10]:
for bc in [100, 80, 60, 40, 20, 0]:
    print(f"Binaural Correlation {bc}%")
    noise(itd=-400, bc=bc)
    noise(itd=   0, bc=bc)
    noise(itd= 400, bc=bc)
Binaural Correlation 100%

Noise 20-20000 Hz, ITD = -400 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = 0 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = 400 μs, ILD = 0 dB, BC = 100%

Binaural Correlation 80%

Noise 20-20000 Hz, ITD = -400 μs, ILD = 0 dB, BC = 80%

Noise 20-20000 Hz, ITD = 0 μs, ILD = 0 dB, BC = 80%

Noise 20-20000 Hz, ITD = 400 μs, ILD = 0 dB, BC = 80%

Binaural Correlation 60%

Noise 20-20000 Hz, ITD = -400 μs, ILD = 0 dB, BC = 60%

Noise 20-20000 Hz, ITD = 0 μs, ILD = 0 dB, BC = 60%

Noise 20-20000 Hz, ITD = 400 μs, ILD = 0 dB, BC = 60%

Binaural Correlation 40%

Noise 20-20000 Hz, ITD = -400 μs, ILD = 0 dB, BC = 40%

Noise 20-20000 Hz, ITD = 0 μs, ILD = 0 dB, BC = 40%

Noise 20-20000 Hz, ITD = 400 μs, ILD = 0 dB, BC = 40%

Binaural Correlation 20%

Noise 20-20000 Hz, ITD = -400 μs, ILD = 0 dB, BC = 20%

Noise 20-20000 Hz, ITD = 0 μs, ILD = 0 dB, BC = 20%

Noise 20-20000 Hz, ITD = 400 μs, ILD = 0 dB, BC = 20%

Binaural Correlation 0%

Noise 20-20000 Hz, ITD = -400 μs, ILD = 0 dB, BC = 0%

Noise 20-20000 Hz, ITD = 0 μs, ILD = 0 dB, BC = 0%

Noise 20-20000 Hz, ITD = 400 μs, ILD = 0 dB, BC = 0%

Time-intensity trade

We have seen that ITD and ILD can both be used alone to shift the perceived location of a signal to the left or to the right - while the other value is kept at zero.

Naturally, ITD and ILD would both vary depending on the sound source location, and their effects would be combined.

In the current artificial situation (using headphones/earphones), where we can control ITD and ILD independently, it is possible to set them to contradictory values (i.e. combinations that wouldn't normally occur with real sound sources). This helps to determine how much influence each of these cues actually has on the perceived location of a sound and how they are combined.

Here is a first sound with ITD and ILD set to zero, to give you a reference for how a central/frontal stimulus sounds:

In [11]:
noise(itd=0, ild=0)

Noise 20-20000 Hz, ITD = 0 μs, ILD = 0 dB, BC = 100%

All of the following samples have an ITD of +300 μs. At the same time, they each have a different ILD, starting at 0 dB and progressively decreasing toward -15 dB.

Keep in mind that positive values of ITD and ILD are associated with stimuli on the right, negative values on the left.

Decrease the ILD step-by-step to move the stimulus back to the center/front. Which ILD "cancels out" the right shift given by the fixed ITD?

Note that even the most central/frontal combination is unlikely to sound exactly the same as the reference above.

In [12]:
for ild in [0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15]:
    noise(itd=300, ild=ild)

Noise 20-20000 Hz, ITD = 300 μs, ILD = 0 dB, BC = 100%

Noise 20-20000 Hz, ITD = 300 μs, ILD = -1 dB, BC = 100%

Noise 20-20000 Hz, ITD = 300 μs, ILD = -2 dB, BC = 100%

Noise 20-20000 Hz, ITD = 300 μs, ILD = -3 dB, BC = 100%

Noise 20-20000 Hz, ITD = 300 μs, ILD = -4 dB, BC = 100%

Noise 20-20000 Hz, ITD = 300 μs, ILD = -5 dB, BC = 100%

Noise 20-20000 Hz, ITD = 300 μs, ILD = -6 dB, BC = 100%

Noise 20-20000 Hz, ITD = 300 μs, ILD = -7 dB, BC = 100%

Noise 20-20000 Hz, ITD = 300 μs, ILD = -8 dB, BC = 100%

Noise 20-20000 Hz, ITD = 300 μs, ILD = -9 dB, BC = 100%

Noise 20-20000 Hz, ITD = 300 μs, ILD = -10 dB, BC = 100%

Noise 20-20000 Hz, ITD = 300 μs, ILD = -11 dB, BC = 100%

Noise 20-20000 Hz, ITD = 300 μs, ILD = -12 dB, BC = 100%

Noise 20-20000 Hz, ITD = 300 μs, ILD = -13 dB, BC = 100%

Noise 20-20000 Hz, ITD = 300 μs, ILD = -14 dB, BC = 100%

Noise 20-20000 Hz, ITD = 300 μs, ILD = -15 dB, BC = 100%